{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Coordinate descent\n",
    "\n",
    "CuML's `lasso` and `elastic net` implementations are both able to use the coordinate descent solver. Lasso extends linear regression by providing L1 regularization and elastic net extends linear regression by providing a combination of L1 and L2 regularizers.\n",
    "\n",
    "A tremendous speed up can be demonstrated for datasets with a large number of rows and fewer columns. Furthermore, the mean squared error (MSE) value for cuML's implementation is much smaller than the Scikit-learn implementation on very small datasets."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The model can take array-like objects, either in host as NumPy arrays or in device (as Numba or cuda_array_interface-compliant), as well  as cuDF DataFrames. \n",
    "\n",
    "For information about cuDF, refer to the [cuDF documentation](https://docs.rapids.ai/api/cudf/stable/).\n",
    "\n",
    "For information about cuML's lasso implementation: https://rapidsai.github.io/projects/cuml/en/stable/api.html#cuml.Lasso.\n",
    "\n",
    "For information about cuML's elastic net implementation: https://rapidsai.github.io/projects/cuml/en/stable/api.html#cuml.ElasticNet."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import cudf\n",
    "import numpy as np\n",
    "from cuml import make_regression, train_test_split\n",
    "from cuml.linear_model import ElasticNet as cuElasticNet, Lasso as cuLasso\n",
    "from cuml.metrics.regression import r2_score\n",
    "from sklearn.linear_model import ElasticNet as skElasticNet, Lasso as skLasso"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Define Parameters"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "n_samples = 2**17\n",
    "n_features = 500\n",
    "\n",
    "learning_rate = 0.001\n",
    "algorithm = \"cyclic\"\n",
    "random_state=23"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Generate Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "X, y = make_regression(n_samples=n_samples, n_features=n_features, random_state=random_state)\n",
    "\n",
    "X = cudf.DataFrame.from_gpu_matrix(X)\n",
    "y = cudf.DataFrame.from_gpu_matrix(y)[0]\n",
    "\n",
    "X_cudf, X_cudf_test, y_cudf, y_cudf_test = train_test_split(X, y, test_size = 0.2, random_state=random_state)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Copy dataset from GPU memory to host memory.\n",
    "# This is done to later compare CPU and GPU results.\n",
    "X_train = X_cudf.to_pandas()\n",
    "X_test = X_cudf_test.to_pandas()\n",
    "y_train = y_cudf.to_pandas()\n",
    "y_test = y_cudf_test.to_pandas()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lasso"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Scikit-learn Model\n",
    "\n",
    "#### Fit, predict and evaluate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "ols_sk = skLasso(alpha=np.array([learning_rate]),\n",
    "                 fit_intercept = True,\n",
    "                 normalize = False,\n",
    "                 max_iter = 1000,\n",
    "                 selection=algorithm,\n",
    "                 tol=1e-10)\n",
    "\n",
    "ols_sk.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "predict_sk = ols_sk.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "r2_score_sk = r2_score(y_cudf_test, predict_sk)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### cuML Model\n",
    "\n",
    "#### Fit, predict and evaluate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "ols_cuml = cuLasso(alpha=np.array([learning_rate]),\n",
    "                   fit_intercept = True,\n",
    "                   normalize = False,\n",
    "                   max_iter = 1000,\n",
    "                   selection=algorithm,\n",
    "                   tol=1e-10)\n",
    "\n",
    "ols_cuml.fit(X_cudf, y_cudf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "predict_cuml = ols_cuml.predict(X_cudf_test).to_array()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "r2_score_cuml = r2_score(y_cudf_test, predict_cuml)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "print(\"R^2 score (SKL):  %s\" % r2_score_sk)\n",
    "print(\"R^2 score (cuML): %s\" % r2_score_cuml)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Elastic Net\n",
    "\n",
    "The elastic net model implemented in cuml contains the same parameters as the lasso model.\n",
    "In addition to the variable values that can be altered in lasso, elastic net has another variable who's value can be changed: `l1_ratio` decides the ratio of amount of L1 and L2 regularization that would be applied to the model. When `l1_ratio = 0`, the model will have only L2 reqularization shall be applied to the model. (default = 0.5)\n",
    "\n",
    "### Scikit-learn Model\n",
    "\n",
    "#### Fit, predict and evaluate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "elastic_sk = skElasticNet(alpha=np.array([learning_rate]),\n",
    "                          fit_intercept = True,\n",
    "                          normalize = False,\n",
    "                          max_iter = 1000,\n",
    "                          selection=algorithm,\n",
    "                          tol=1e-10)\n",
    "\n",
    "elastic_sk.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "predict_elas_sk = elastic_sk.predict(X_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "r2_score_elas_sk = r2_score(y_cudf_test, predict_elas_sk)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### cuML Model\n",
    "\n",
    "#### Fit, predict and evaluate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "elastic_cuml = cuElasticNet(alpha=np.array([learning_rate]), \n",
    "                            fit_intercept = True,\n",
    "                            normalize = False,\n",
    "                            max_iter = 1000,\n",
    "                            selection=algorithm,\n",
    "                            tol=1e-10)\n",
    "\n",
    "elastic_cuml.fit(X_cudf, y_cudf)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "predict_elas_cuml = elastic_cuml.predict(X_cudf_test).to_array()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%%time\n",
    "r2_score_elas_cuml = r2_score(y_cudf_test, predict_elas_cuml)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Compare Results"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(\"R^2 score (SKL):  %s\" % r2_score_elas_sk)\n",
    "print(\"R^2 score (cuML): %s\" % r2_score_elas_cuml)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
